library(ggstatsplot)

# Compare sepal length across the three Iris species
ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  title = "Distribution of sepal length across Iris species"
)
Indrajeet Patil
Why a new package?
(as an academic researcher)
Large-scale problems
“39% of effects were subjectively rated to have replicated the original result”1
“half of all published psychology papers contained at least one p-value that was inconsistent”2
“in 72% of cases, nonsignificant results were misinterpreted [to mean] that effect was absent”3
Personal challenges
Information-rich, ready-made statistical visualizations
💡 Graphical summaries reveal problems not discernible from numerical statistics!
The grammar of graphics is a powerful framework that can express almost any data visualization!
💡 Using ready-made plots lowers the activation energy for visualizing data!
{ggstatsplot} was born!
(started out as loose scripts, which were later consolidated into a package)
For example, for hypotheses about group differences (ggbetweenstats()):
Important
Information-rich defaults
Statistical approaches available
The appendix provides exhaustive details for all functions.
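A minimal sketch of how the approach is selected, assuming a recent {ggstatsplot} release: the `type` argument of ggbetweenstats() switches between the four families of tests, while the rest of the call stays the same.

```r
library(ggstatsplot)

# One plot per statistical approach; only the `type` argument changes
types <- c("parametric", "nonparametric", "robust", "bayes")

plots <- lapply(types, function(approach) {
  ggbetweenstats(
    data = iris,
    x = Species,
    y = Sepal.Length,
    type = approach,
    title = paste("Approach:", approach)
  )
})
names(plots) <- types
```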
Does it deliver?
Without {ggstatsplot}
Pearson’s correlation test revealed that, across 142 participants, variable x was negatively correlated with variable y: \(t(140)=-0.76, p=0.446\). The effect size \((r=-0.06, 95\% CI [-0.23,0.10])\) was small, as per Cohen’s (1988) conventions. The Bayes Factor for the same analysis revealed that the data were 5.81 times more probable under the null hypothesis as compared to the alternative hypothesis. This can be considered moderate evidence (Jeffreys, 1961) in favor of the null hypothesis (absence of any correlation between x and y).
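To show what the manual route involves, here is a hedged sketch using only base R's cor.test() on simulated data (so the numbers will differ from the paragraph above). It also checks the identity linking the reported t statistic to r and the sample size n, \(t = r\sqrt{n-2}/\sqrt{1-r^2}\); the Bayes factor would still require yet another package.

```r
# Hypothetical data: 142 simulated participants (not the data behind the slide)
set.seed(123)
n <- 142
x <- rnorm(n)
y <- rnorm(n)

# Pearson's correlation test from base R's {stats} package
res <- cor.test(x, y, method = "pearson")

r <- unname(res$estimate)        # correlation coefficient
df <- unname(res$parameter)      # degrees of freedom: n - 2
t_stat <- unname(res$statistic)  # t statistic

# The reported t follows from r and n: t = r * sqrt(n - 2) / sqrt(1 - r^2)
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Assemble the report string by hand
report <- sprintf(
  "t(%d) = %.2f, p = %.3f, r = %.2f, 95%% CI [%.2f, %.2f]",
  as.integer(df), t_stat, res$p.value, r, res$conf.int[1], res$conf.int[2]
)
print(report)
```

Every number in the report has to be copied and formatted by hand, which is where reporting errors creep in.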
With {ggstatsplot}
✅ No need to worry about reporting or interpretation errors!
✅ Shortens the time from data to insight by combining data visualization and statistical analysis in a single step!
❌ Promotes mindless application of statistical tests.
{ggplot2} extension.
1.0) release yet.
Maybe the real treasure was the technical skills we picked up along the way!
Breaking down the initial monolith (\(>\) 20K LOC) into smaller, more manageable pieces (now \(<\) 800 LOC).
flowchart TD
ggstatsplot[ggstatsplot]
statsExpressions[statsExpressions]
note["statistical computation engine"]
subgraph easystats[easystats ecosystem]
effectsize[effectsize]
insight[insight]
parameters[parameters]
performance[performance]
bayestestR[bayestestR]
end
%% subgraph other[Graphics Dependencies]
%% ggplot2[ggplot2]
%% ggrepel[ggrepel]
%% ggsignif[ggsignif]
%% end
%% Main dependencies
ggstatsplot --> statsExpressions
ggstatsplot --> dots[Other dependencies]
%% Add note connecting to the main relationship
note -.-> statsExpressions
%% statsExpressions dependencies on easystats packages
statsExpressions --> effectsize
statsExpressions --> insight
statsExpressions --> parameters
statsExpressions --> performance
statsExpressions --> bayestestR
%% Styling using colorblind-friendly colors
classDef main fill:#EE7733,stroke:#333,stroke-width:2px
classDef stats fill:#009988,stroke:#333,stroke-width:2px
classDef easy fill:#CCBB44,stroke:#333,stroke-width:1px
classDef other fill:#88CCEE,stroke:#333,stroke-width:1px
classDef note fill:#ffffff,stroke:#333,stroke-width:1px,stroke-dasharray: 5 5
class ggstatsplot main
class statsExpressions stats
class effectsize,insight,parameters,performance,bayestestR easy
class dots other
class note note
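The split shown in the diagram means the statistics can be computed without any plotting. A minimal sketch, assuming {statsExpressions} 1.x, where oneway_anova() returns a data frame of results together with the `expression` that ggbetweenstats() places in the plot subtitle:

```r
library(statsExpressions)

# Run the statistical computation on its own, with no plot involved
res <- oneway_anova(
  data = iris,
  x = Species,
  y = Sepal.Length,
  type = "parametric"
)

# Test statistic, p-value, effect size with CI, ...
print(res)

# ... plus a ready-to-plot expression used as the plot subtitle
res$expression[[1]]
```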
While re-architecting {ggstatsplot}, in the spirit of open source, I decided to contribute to upstream dependencies:
{easystats} team and contributing to its ten component packages
{ggsignif}
{WRS2} and {ggcorrplot}
CI Checks (GitHub Actions)
Healthy and active code base
While improving QA tools for {ggstatsplot}, in the spirit of open source, I decided to contribute to upstream dependencies:
{lintr} (linter)
{styler} (formatter)
Benefits of the {ggstatsplot} approach
{ggstatsplot} combines data visualization and statistical analysis in a single step.
It…
Source code for these slides can be found on GitHub.
If you are interested in good programming and software development practices, check out my other slide decks.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os Ubuntu 22.04.5 LTS
system x86_64, linux-gnu
hostname fv-az651-268
ui X11
language (EN)
collate C.UTF-8
ctype C.UTF-8
tz UTC
date 2024-11-12
pandoc 3.5 @ /opt/hostedtoolcache/pandoc/3.5/x64/ (via rmarkdown)
quarto 1.6.33 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
base * 4.4.2 2024-10-31 [3] local
BayesFactor 0.9.12-4.7 2024-01-24 [1] RSPM
bayestestR 0.15.0 2024-10-17 [1] RSPM
bitops 1.0-9 2024-10-03 [1] RSPM
BWStest 0.2.3 2023-10-10 [1] RSPM
cachem 1.1.0 2024-05-16 [1] RSPM
cli 3.6.3 2024-06-21 [1] RSPM
coda 0.19-4.1 2024-01-31 [1] RSPM
colorspace 2.1-1 2024-07-26 [1] RSPM
compiler 4.4.2 2024-10-31 [3] local
correlation 0.8.6 2024-10-26 [1] RSPM
cranlogs 2.1.1 2019-04-29 [1] RSPM
curl 6.0.0 2024-11-05 [1] RSPM
data.table 1.16.2 2024-10-10 [1] RSPM
datasets * 4.4.2 2024-10-31 [3] local
datawizard 0.13.0 2024-10-05 [1] RSPM
digest 0.6.37 2024-08-19 [1] RSPM
dplyr 1.1.4 2023-11-17 [1] RSPM
effectsize 0.8.9 2024-07-03 [1] RSPM
evaluate 1.0.1 2024-10-10 [1] RSPM
fansi 1.0.6 2023-12-08 [1] RSPM
farver 2.1.2 2024-05-13 [1] RSPM
fastmap 1.2.0 2024-05-15 [1] RSPM
generics 0.1.3 2022-07-05 [1] RSPM
ggplot2 * 3.5.1 2024-04-23 [1] RSPM
ggrepel 0.9.6 2024-09-07 [1] RSPM
ggsignif 0.6.4 2022-10-13 [1] RSPM
ggstatsplot * 0.12.5.9000 2024-11-12 [1] Github (IndrajeetPatil/ggstatsplot@450fa64)
glue 1.8.0 2024-09-30 [1] RSPM
gmp 0.7-5 2024-08-23 [1] RSPM
graphics * 4.4.2 2024-10-31 [3] local
grDevices * 4.4.2 2024-10-31 [3] local
grid 4.4.2 2024-10-31 [3] local
gtable 0.3.6 2024-10-25 [1] RSPM
htmltools 0.5.8.1 2024-04-04 [1] RSPM
httr 1.4.7 2023-08-15 [1] RSPM
insight 0.20.5 2024-10-02 [1] RSPM
jsonlite 1.8.9 2024-09-20 [1] RSPM
knitr 1.49 2024-11-08 [1] RSPM
kSamples 1.2-10 2023-10-07 [1] RSPM
labeling 0.4.3 2023-08-29 [1] RSPM
lattice 0.22-6 2024-03-20 [3] CRAN (R 4.4.2)
lifecycle 1.0.4 2023-11-07 [1] RSPM
lubridate 1.9.3 2023-09-27 [1] RSPM
magrittr 2.0.3 2022-03-30 [1] RSPM
MASS 7.3-61 2024-06-13 [3] CRAN (R 4.4.2)
Matrix 1.7-1 2024-10-18 [3] CRAN (R 4.4.2)
MatrixModels 0.5-3 2023-11-06 [1] RSPM
memoise 2.0.1 2021-11-26 [1] RSPM
methods * 4.4.2 2024-10-31 [3] local
mgcv 1.9-1 2023-12-21 [3] CRAN (R 4.4.2)
multcompView 0.1-10 2024-03-08 [1] RSPM
munsell 0.5.1 2024-04-01 [1] RSPM
mvtnorm 1.3-2 2024-11-04 [1] RSPM
nlme 3.1-166 2024-08-14 [3] CRAN (R 4.4.2)
packageRank * 0.9.3 2024-10-16 [1] RSPM
paletteer 1.6.0 2024-01-21 [1] RSPM
parallel 4.4.2 2024-10-31 [3] local
parameters 0.23.0 2024-10-18 [1] RSPM
patchwork 1.3.0 2024-09-16 [1] RSPM
pbapply 1.7-2 2023-06-27 [1] RSPM
performance 0.12.4 2024-10-18 [1] RSPM
pillar 1.9.0 2023-03-22 [1] RSPM
pkgconfig 2.0.3 2019-09-22 [1] RSPM
pkgsearch 3.1.3 2023-12-10 [1] RSPM
PMCMRplus 1.9.12 2024-09-08 [1] RSPM
prismatic 1.1.2 2024-04-10 [1] RSPM
purrr 1.0.2 2023-08-10 [1] RSPM
R.methodsS3 1.8.2 2022-06-13 [1] RSPM
R.oo 1.27.0 2024-11-01 [1] RSPM
R.utils 2.12.3 2023-11-18 [1] RSPM
R6 2.5.1 2021-08-19 [1] RSPM
Rcpp 1.0.13-1 2024-11-02 [1] RSPM
RCurl 1.98-1.16 2024-07-11 [1] RSPM
rematch2 2.1.2 2020-05-01 [1] RSPM
rlang 1.1.4 2024-06-04 [1] RSPM
rmarkdown 2.29 2024-11-04 [1] RSPM
Rmpfr 0.9-5 2024-01-21 [1] RSPM
scales 1.3.0 2023-11-28 [1] RSPM
sessioninfo 1.2.2.9000 2024-11-10 [1] Github (r-lib/sessioninfo@37c81af)
splines 4.4.2 2024-10-31 [3] local
stats * 4.4.2 2024-10-31 [3] local
statsExpressions 1.6.1 2024-10-31 [1] RSPM
stringi 1.8.4 2024-05-06 [1] RSPM
stringr 1.5.1 2023-11-14 [1] RSPM
sugrrants 0.2.9 2024-03-12 [1] RSPM
SuppDists 1.1-9.8 2024-09-03 [1] RSPM
tibble 3.2.1 2023-03-20 [1] RSPM
tidyr 1.3.1 2024-01-24 [1] RSPM
tidyselect 1.2.1 2024-03-11 [1] RSPM
timechange 0.3.0 2024-01-18 [1] RSPM
tools 4.4.2 2024-10-31 [3] local
utf8 1.2.4 2023-10-22 [1] RSPM
utils * 4.4.2 2024-10-31 [3] local
vctrs 0.6.5 2023-12-01 [1] RSPM
withr 3.0.2 2024-10-28 [1] RSPM
xfun 0.49 2024-10-31 [1] RSPM
yaml 2.3.10 2024-07-26 [1] RSPM
zeallot 0.1.0 2018-01-28 [1] RSPM
[1] /home/runner/work/_temp/Library
[2] /opt/R/4.4.2/lib/R/site-library
[3] /opt/R/4.4.2/lib/R/library
* ── Packages attached to the search path.
──────────────────────────────────────────────────────────────────────────────
ggwithinstats()
Hypothesis about group differences: repeated measures design
Important
✏️ Defaults
Statistical approaches available
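A short sketch for the repeated-measures case, assuming the `bugs_long` example dataset that ships with {ggstatsplot} (long format: one row per subject and condition):

```r
library(ggstatsplot)

# Each subject rates `desire` under every `condition` (repeated measures)
p <- ggwithinstats(
  data = bugs_long,
  x = condition,
  y = desire,
  type = "nonparametric"  # rank-based tests instead of the parametric default
)
```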
gghistostats()
Distribution of a numeric variable
Important
✏️ Defaults
Statistical approaches available
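A minimal sketch using the built-in `mtcars` data; the test value of 20 mpg and the bin width are arbitrary choices for illustration:

```r
library(ggstatsplot)

# Histogram of fuel efficiency plus a one-sample test against 20 mpg
p <- gghistostats(
  data = mtcars,
  x = mpg,
  test.value = 20,  # illustrative null value
  binwidth = 2
)
```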
ggdotplotstats()
Labeled numeric variable
Important
✏️ Defaults
Statistical approaches available
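A minimal sketch using `ggplot2::mpg`, assuming the default behaviour of summarizing each label (here, each manufacturer) by the mean of `x` before the one-sample test; the test value of 15 is illustrative:

```r
library(ggstatsplot)

# One dot per manufacturer (average city mileage), tested against 15 mpg
p <- ggdotplotstats(
  data = ggplot2::mpg,
  x = cty,
  y = manufacturer,
  test.value = 15  # illustrative null value
)
```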
ggscatterstats()
Hypothesis about correlation: Two numeric variables
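A minimal sketch with `mtcars`; `marginal = FALSE` is set only to skip the marginal distributions, which may need an additional package in recent versions:

```r
library(ggstatsplot)

# Correlation between car weight and fuel efficiency
p <- ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "robust",  # or "parametric", "nonparametric", "bayes"
  marginal = FALSE
)
```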
ggcorrmat()
Hypothesis about correlation: Multiple numeric variables
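A minimal sketch on four arbitrarily chosen `mtcars` columns; `type = "nonparametric"` requests Spearman's rho instead of Pearson's r:

```r
library(ggstatsplot)

# Pairwise correlations among four numeric variables
p <- ggcorrmat(
  data = mtcars[, c("mpg", "wt", "hp", "qsec")],
  type = "nonparametric"
)
```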
ggpiestats()
Hypothesis about composition of categorical variables
ggbarstats()
Hypothesis about composition of categorical variables
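A sketch for both functions on `mtcars`, with the numeric codes converted to factors first. With both `x` and `y`, the functions test for an association; with only `x`, ggpiestats() tests whether the levels occur in equal proportions.

```r
library(ggstatsplot)

# Treat transmission (am) and cylinder count (cyl) as categorical
cars <- transform(mtcars, am = factor(am), cyl = factor(cyl))

# Association between two categorical variables
p_bar <- ggbarstats(data = cars, x = am, y = cyl)

# Goodness-of-fit: are the cylinder levels equally common?
p_pie <- ggpiestats(data = cars, x = cyl)
```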
ggcoefstats()
Hypothesis about regression coefficients
Important
✏️ Defaults
Supports all regression models supported in {easystats} ecosystem.
Meta-analysis is also supported!
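A minimal sketch with an ordinary linear model; other model classes supported by {easystats} should be passed in the same way:

```r
library(ggstatsplot)

# Fit a linear model of fuel efficiency on weight and horsepower
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Dot-and-whisker plot of estimates with CIs and per-term statistics
p <- ggcoefstats(mod)
```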
Iterating over a grouping variable
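Most functions have a `grouped_` variant that repeats the analysis for each level of a grouping variable and combines the panels. A minimal sketch, with `marginal = FALSE` forwarded to ggscatterstats() to keep the panels simple:

```r
library(ggstatsplot)

# One scatterplot (with its own statistics) per species, combined into one figure
p <- grouped_ggscatterstats(
  data = iris,
  x = Sepal.Length,
  y = Petal.Length,
  grouping.var = Species,
  marginal = FALSE
)
```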
“What if I don’t like the default plots?” 🤔
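Because every function returns a {ggplot2} object, the defaults can be changed with the `ggtheme` argument or by adding ordinary {ggplot2} components afterwards. A minimal sketch:

```r
library(ggstatsplot)
library(ggplot2)

# Swap the default theme, then add regular ggplot2 labels on top
p <- ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  ggtheme = theme_classic()
) +
  labs(x = "Species", y = "Sepal length (cm)")
```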
{ggstatsplot}: Details about statistical reporting
Note
| Functions | Description | Parametric | Non-parametric | Robust | Bayesian |
|---|---|---|---|---|---|
| ggbetweenstats() | Between group comparisons | ✅ | ✅ | ✅ | ✅ |
| ggwithinstats() | Within group comparisons | ✅ | ✅ | ✅ | ✅ |
| gghistostats(), ggdotplotstats() | Distribution of a numeric variable | ✅ | ✅ | ✅ | ✅ |
| ggcorrmat() | Correlation matrix | ✅ | ✅ | ✅ | ✅ |
| ggscatterstats() | Correlation between two variables | ✅ | ✅ | ✅ | ✅ |
| ggpiestats(), ggbarstats() | Association between categorical variables | ✅ | NA | NA | ✅ |
| ggpiestats(), ggbarstats() | Equal proportions for categorical variable levels | ✅ | NA | NA | ✅ |
| ggcoefstats() | Regression modeling | ✅ | ✅ | ✅ | ✅ |
| ggcoefstats() | Random-effects meta-analysis | ✅ | NA | ✅ | ✅ |
Hunting for packages
📦 for inferential statistics ({stats})
📦 computing effect size + CIs ({effectsize})
📦 for descriptive statistics ({skimr})
📦 pairwise comparisons ({multcomp})
📦 Bayesian hypothesis testing ({BayesFactor})
📦 Bayesian estimation ({bayestestR})
📦 …
Inconsistent APIs
🤔 accepts data frame, vector, matrix?
🤔 long/wide format data?
🤔 works with NAs?
🤔 returns data frame, vector, matrix?
🤔 works with tibbles?
🤔 has all necessary details?
🤔 …